{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Epsilla Vector Store\n",
    "In this notebook we are going to show how to use [Epsilla](https://www.epsilla.com/) to perform vector searches in LlamaIndex."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a prerequisite, you need to have a running Epsilla vector database (for example, through our docker image), and install the ``pyepsilla`` package.\n",
    "View full docs at [docs](https://epsilla-inc.gitbook.io/epsilladb/quick-start)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!pip/pip3 install pyepsilla"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import SimpleDirectoryReader, Document, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import EpsillaVectorStore\n",
    "import textwrap"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "Lets first begin by adding the openai api key. It will be used to created embeddings for the documents loaded into the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import getpass\n",
    "\n",
    "OPENAI_API_KEY = getpass.getpass(\"OpenAI API Key:\")\n",
    "openai.api_key = OPENAI_API_KEY"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load documents stored in the `/data/paul_graham` folder using the SimpleDirectoryReader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total documents: 1\n",
      "First document, id: ac7f23f0-ce15-4d94-a0a2-5020fa87df61\n",
      "First document, hash: 4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"../data/paul_graham/\").load_data()\n",
    "print(f\"Total documents: {len(documents)}\")\n",
    "print(f\"First document, id: {documents[0].doc_id}\")\n",
    "print(f\"First document, hash: {documents[0].hash}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the index\n",
    "Here we create an index backed by Epsilla using the documents loaded previously. EpsillaVectorStore takes a few arguments.\n",
    "- client (Any): Epsilla client to connect to.\n",
    "\n",
    "- collection_name (str, optional): Which collection to use. Defaults to \"llama_collection\".\n",
    "- db_path (str, optional): The path where the database will be persisted. Defaults to \"/tmp/langchain-epsilla\".\n",
    "- db_name (str, optional): Give a name to the loaded database. Defaults to \"langchain_store\".\n",
    "- dimension (int, optional): The dimension of the embeddings. If not provided, collection creation will be done on first insert. Defaults to None.\n",
    "- overwrite (bool, optional): Whether to overwrite existing collection with same name. Defaults to False.\n",
    "\n",
    "Epsilla vectordb is running with default host \"localhost\" and port \"8888\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[INFO] Connected to localhost:8888 successfully.\n"
     ]
    }
   ],
   "source": [
    "# Create an index over the documnts\n",
    "from pyepsilla import vectordb\n",
    "\n",
    "client = vectordb.Client()\n",
    "vector_store = EpsillaVectorStore(client=client, db_path=\"/tmp/llamastore\")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Query the data\n",
    "Now we have our document stored in the index, we can ask questions against the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author of the given context information is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author learned about AI through various sources. One source was a novel called \"The Moon is a\n",
      "Harsh Mistress\" by Heinlein, which featured an intelligent computer called Mike. Another source was\n",
      "a PBS documentary that showed Terry Winograd using SHRDLU, a program that could understand natural\n",
      "language. These experiences sparked the author's interest in AI and motivated them to start learning\n",
      "about it, including teaching themselves Lisp, which was regarded as the language of AI at the time.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"How did the author learn about AI?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's try to overwrite the previous data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There is no information provided about the author in the given context.\n"
     ]
    }
   ],
   "source": [
    "vector_store = EpsillaVectorStore(client=client, overwrite=True)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "single_doc = Document(text=\"Epsilla is the vector database we are using.\")\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    [single_doc],\n",
    "    storage_context=storage_context,\n",
    ")\n",
    "\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epsilla is the vector database being used.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What vector database is being used?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's add more data to existing collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author of the given context information is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "vector_store = EpsillaVectorStore(client=client, overwrite=False)\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n",
    "for doc in documents:\n",
    "    index.insert(document=doc)\n",
    "\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epsilla is the vector database being used.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What vector database is being used?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
